Scalable Inference for Logistic-Normal Topic Models
Authors
Abstract
Logistic-normal topic models can effectively discover correlation structures among latent topics. However, their inference remains a challenge because of the non-conjugacy between the logistic-normal prior and the multinomial topic mixing proportions. Existing algorithms either make restrictive mean-field assumptions or do not scale to large applications. This paper presents a partially collapsed Gibbs sampling algorithm that approaches the provably correct distribution by exploring the ideas of data augmentation. To improve time efficiency, we further present a parallel implementation that can handle large-scale applications and learn the correlation structures of thousands of topics from millions of documents. Extensive empirical results demonstrate the promise of our approach.
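To make the data-augmentation idea concrete, below is a minimal single-document sketch of the kind of Polya-Gamma augmentation commonly used to restore conjugacy for logistic-normal likelihoods: an auxiliary PG variable is drawn for each topic, after which the logistic-normal natural parameter has a closed-form Gaussian conditional. This is an illustrative assumption, not the paper's implementation; the function names, the diagonal Gaussian prior, and the truncated sum-of-gammas PG sampler are all simplifications for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg(b, c, trunc=200, rng=rng):
    """Draw from the Polya-Gamma PG(b, c) distribution using its truncated
    infinite-sum-of-gammas representation (Polson, Scott & Windle, 2013):
    PG(b, c) = (1 / 2*pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4*pi^2)),
    with g_k ~ Gamma(b, 1)."""
    k = np.arange(1, trunc + 1)
    g = rng.gamma(b, 1.0, size=trunc)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2))) / (2.0 * np.pi ** 2)

def gibbs_eta_step(eta, counts, mu, sigma2, rng=rng):
    """One Gibbs sweep over topics for a single document (illustrative).
    eta    : natural parameters of the topic proportions (length K)
    counts : number of words currently assigned to each topic (length K)
    mu, sigma2 : diagonal Gaussian prior on eta (a simplifying assumption;
                 a full logistic-normal prior has a dense covariance)."""
    K = len(eta)
    N = counts.sum()
    for k in range(K):
        # zeta: log-sum-exp of the other topics' natural parameters
        zeta = np.log(np.sum(np.exp(np.delete(eta, k))))
        rho = eta[k] - zeta
        omega = sample_pg(N, rho, rng=rng)   # augmented Polya-Gamma variable
        kappa = counts[k] - N / 2.0          # centered count
        # With omega fixed, eta[k] has a Gaussian full conditional
        prec = omega + 1.0 / sigma2[k]
        mean = (kappa + omega * zeta + mu[k] / sigma2[k]) / prec
        eta[k] = rng.normal(mean, 1.0 / np.sqrt(prec))
    return eta

eta = gibbs_eta_step(np.zeros(3), np.array([5.0, 3.0, 2.0]),
                     np.zeros(3), np.ones(3))
```

Alternating this step with standard resampling of the word-to-topic assignments yields a Gibbs sampler with no variational approximation, which is the property the abstract highlights.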
Similar References
On Topic Evolution
I introduce topic evolution models for longitudinal epochs of word documents. The models employ marginally dependent latent state-space models for evolving topic proportion distributions and topic-specific word distributions; and either a logistic-normal-multinomial or a logistic-normal-Poisson model for document likelihood. These models allow posterior inference of latent topic themes over time...
Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors
Previous work on probabilistic topic models has either focused on models with relatively simple conjugate priors that support Gibbs sampling or models with non-conjugate priors that typically require variational inference. Gibbs sampling is more accurate than variational inference and better supports the construction of composite models. We present a method for Gibbs sampling in non-conjugate l...
Correlated Topic Models
Topic models, such as latent Dirichlet allocation (LDA), have been an effective tool for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a doc...
The Discrete Infinite Logistic Normal Distribution for Mixed-Membership Modeling
We present the discrete infinite logistic normal distribution (DILN, “Dylan”), a Bayesian nonparametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables, a...
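The "normalized collection of gamma-distributed random variables" mentioned above can be illustrated with a tiny sketch. Normalizing independent gamma draws yields a random point on the probability simplex (for fixed shapes and a common rate, this is exactly a Dirichlet draw); DILN builds on this construction by modulating the gamma variables at the group level, which this sketch does not attempt to reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_gamma_weights(shapes, rng=rng):
    """Draw independent Gamma(shape_k, 1) variables and normalize them.
    With constant shapes this is equivalent to sampling from a
    symmetric Dirichlet distribution."""
    g = rng.gamma(shapes, 1.0)
    return g / g.sum()

w = normalized_gamma_weights(np.ones(5))  # a point on the 4-simplex
```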
On Tight Approximate Inference of the Logistic-Normal Topic Admixture Model
The Logistic-Normal Topic Admixture Model (LoNTAM), also known as correlated topic model (Blei and Lafferty, 2005), is a promising and expressive admixture-based text model. It can capture topic correlations via the use of a logistic-normal distribution to model non-trivial variabilities in the topic mixing vectors underlying documents. However, the non-conjugacy caused by the logistic-normal m...